[Python] Best strategy for dealing with incomplete lines of data from a file.

Posted by adoran on Stack Overflow
Published on 2010-06-16T14:12:13Z Indexed on 2010/06/16 14:22 UTC

I use the following block of code to read lines out of a file 'f' into a nested list:

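t = []       # first column of each row
strmat = []  # remaining columns, still as strings
for line in f:
    # Strip only the line ending; a bare rstrip() would also remove
    # trailing tabs and so drop empty fields at the end of a row.
    fields = line.rstrip('\r\n').split('\t')
    t.append(fields[0])
    strmat.append(fields[1:])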

Sometimes, however, the data is incomplete and a row may look like this:

['955.159', '62.8168', '', '', '', '', '', '', '', '', '', '', '', '', '', '29', '30', '0', '0']

This puts a spanner in the works: I would like the nested list to be converted implicitly to an array of floats, but the empty fields '' cause it to be converted to an array of strings instead (dtype: S12).

I could add an 'if' statement that converts every empty field into a null value (0 would be wrong in this instance), but I am unsure whether this is the best approach.
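The conversion described above might be sketched as follows, assuming NaN is an acceptable stand-in for a missing value. The helper name `to_float` and the sample data are hypothetical and only stand in for the real file 'f':

```python
import io
import math

def to_float(field):
    """Return the field as a float, or NaN if the field is empty."""
    field = field.strip()
    return float(field) if field else float('nan')

# Hypothetical sample standing in for the file 'f'; the second row has a gap.
f = io.StringIO("1.0\t2.5\t3.5\n955.159\t62.8168\t\t29\n")

t = []    # first column of each row, as floats
mat = []  # remaining columns as floats, NaN where data is missing
for line in f:
    # Strip only the line ending so that empty trailing fields survive.
    fields = line.rstrip('\r\n').split('\t')
    t.append(to_float(fields[0]))
    mat.append([to_float(x) for x in fields[1:]])
```

Because every entry is now a float, a nested list like `mat` should no longer be forced into a string type, and the gaps can still be found later with `math.isnan`.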

  1. Is this the best strategy for dealing with incomplete data?
  2. Should I clean the data while reading the stream, or afterwards (post hoc)?
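For comparison with cleaning the stream, a post-hoc pass over rows that have already been read as strings could look like the sketch below. The function name `row_to_floats`, its `missing` parameter, and the sample rows are illustrative assumptions, not part of the original code:

```python
import math

# Hypothetical rows as they might look after reading, still as strings.
strmat = [['62.8168', '', '', '29', '30', '0', '0'],
          ['1.5', '2.5', '3.5', '4', '5', '6', '7']]

def row_to_floats(row, missing=float('nan')):
    """Convert a row of strings to floats, substituting `missing` for ''."""
    return [float(x) if x.strip() else missing for x in row]

mat = [row_to_floats(row) for row in strmat]

# Indices of rows that contain at least one missing value.
incomplete = [i for i, row in enumerate(mat)
              if any(math.isnan(x) for x in row)]
```

Either order should give the same numbers. Converting while reading avoids holding a second, string-valued copy of the data, whereas the post-hoc pass keeps the raw strings available for inspection.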

© Stack Overflow or respective owner
